Advances in pre-processing and model generation for mass spectrometric data analysis
The analysis of complex signals such as those obtained by mass spectrometric measurements is complicated and requires an appropriate representation of the data. In this context, the kind of preprocessing, the feature extraction, and the similarity measure used are of particular importance. When the focus is on biomarker analysis and the functional nature of the data is taken into account, this task becomes even more complicated. A new data preprocessing tailored to mass spectrometry is presented, discussed, and analyzed in a clinical proteomic study and compared to a standard setting.
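The abstract does not describe the tailored preprocessing itself. The sketch below is only a generic illustration of the three ingredients it names (preprocessing, feature extraction, similarity measure), assuming common textbook choices: a rolling-minimum baseline, total-ion-current (TIC) normalization, fixed-width m/z binning, and cosine similarity. All function names and parameters (`remove_baseline`, `preprocess`, `window`, the bin grid) are hypothetical and are not the method evaluated in the study.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def remove_baseline(intensity, window=51):
    """Subtract a rolling-minimum baseline (a simple, generic choice, assumed here)."""
    half = window // 2
    size = 2 * half + 1  # odd window so output length equals input length
    padded = np.pad(intensity, half, mode="edge")
    baseline = sliding_window_view(padded, size).min(axis=1)
    # Non-negative by construction: each window contains the point itself.
    return intensity - baseline


def preprocess(mz, intensity, bin_edges, window=51):
    """Baseline removal, TIC normalization, and binning onto a fixed m/z grid."""
    x = remove_baseline(np.asarray(intensity, dtype=float), window)
    total = x.sum()
    if total > 0:
        x = x / total  # total-ion-current normalization
    # Fixed-length feature vector: summed intensity per m/z bin.
    features, _ = np.histogram(mz, bins=bin_edges, weights=x)
    return features


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy usage: one synthetic peak on a constant offset, measured at two intensity scales.
mz = np.linspace(1000.0, 2000.0, 500)
peak = np.exp(-0.5 * ((mz - 1500.0) / 5.0) ** 2)
spec_a = 0.2 + peak
spec_b = 3.0 * spec_a
edges = np.linspace(1000.0, 2000.0, 51)
fa = preprocess(mz, spec_a, edges)
fb = preprocess(mz, spec_b, edges)
print(len(fa), cosine_similarity(fa, fb))
```

Because TIC normalization removes the global intensity scale, the two toy spectra map to nearly identical feature vectors; a tailored pipeline would replace each of these generic steps with mass-spectrometry-specific choices.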
Multi-perspective embedding for non-metric time series classification
The interest in time series analysis is rapidly increasing, providing new challenges for machine learning. For many decades, Dynamic Time Warping (DTW) has been regarded as the de facto standard distance measure for time series and the tool of choice when analyzing such data. Nevertheless, DTW has two major drawbacks: (a) it is non-metric and therefore hard to handle with standard machine learning techniques, and (b) it is not well suited for multi-dimensional time series. To address these drawbacks, we propose a multi-perspective embedding of the time series into a complex-valued vector space, combined with a model that is able to handle complex-valued data. The approach is evaluated on various multi-dimensional time series data sets and with different classifiers.
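Drawback (a) can be made concrete with the classic DTW recursion. The sketch below assumes one-dimensional sequences and an absolute-difference local cost (for multi-dimensional series a vector norm would take its place); it is not the multi-perspective embedding proposed above, only an illustration of why DTW violates the triangle inequality.

```python
def dtw(x, y):
    """Classic DTW distance between 1-D sequences with |a - b| as local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = cost of the best warping path aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Step from a match, an insertion, or a deletion.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


a, b, c = [0], [1], [1, 1, 1]
# Warping lets b match c at zero cost, but a must pay once per element of c.
print(dtw(a, c), dtw(a, b), dtw(b, c))  # prints 3.0 1.0 0.0
```

Since 3.0 > 1.0 + 0.0, the triangle inequality fails, which is why DTW dissimilarities cannot in general be treated as distances in a Euclidean space.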
Data-Driven Supervised Learning for Life Science Data
Life science data are often encoded in a non-standard way by means of alphanumeric sequences, graph representations, numerical vectors of variable length, or other formats. Domain-specific or data-driven similarity measures like alignment functions have been employed with great success. However, the vast majority of more complex data analysis algorithms require fixed-length vectorial input, calling for substantial preprocessing of life science data, and data-driven measures are widely ignored in favor of simple encodings. These preprocessing steps are neither always easy to perform nor particularly effective, and they risk a loss of information and interpretability. We present strategies and concepts for employing data-driven similarity measures in the life science context and in other complex biological systems. In particular, we show how to use data-driven similarity measures effectively in standard learning algorithms.
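As a minimal illustration of using a data-driven measure directly, without first forcing sequences into fixed-length vectors, the sketch below plugs an edit distance into a 1-nearest-neighbor classifier, which needs only pairwise dissimilarities. The edit distance stands in for an alignment function, and the toy sequences and labels are hypothetical; this is not the set of strategies the work itself presents.

```python
def edit_distance(a, b):
    """Levenshtein distance (unit costs), a simple stand-in for an alignment score."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def nearest_neighbor(query, train):
    """1-NN on raw sequences; train is a list of (sequence, label) pairs."""
    _, label = min(train, key=lambda item: edit_distance(query, item[0]))
    return label


# Hypothetical toy data: variable-length sequences need no vector encoding here.
train = [("ACGTACGT", "A"), ("ACGTACGA", "A"),
         ("TTTTGGGG", "B"), ("TTTGGGGG", "B")]
print(nearest_neighbor("ACGTACGG", train), nearest_neighbor("TTTTGGGA", train))
```

Nearest-neighbor methods are a natural entry point because they never need an explicit vector space; methods that do (kernel machines, prototype learners) require the dissimilarities to be embedded or corrected first.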
Complex-valued embeddings of generic proximity data
Proximities are at the heart of almost all machine learning methods. If the input data are given as numerical vectors of equal length, the Euclidean distance or a Hilbertian inner product is frequently used in modeling algorithms. In a more generic view, objects are compared by a (symmetric) similarity or dissimilarity measure, which may not obey particular mathematical properties. This renders many machine learning methods invalid, leading to convergence problems and the loss of guarantees such as generalization bounds. In many cases, the preferred dissimilarity measure is not metric, like the earth mover's distance, or the similarity measure may not be a simple inner product in a Hilbert space but one in its generalization, a Krein space. If the input data are non-vectorial, like text sequences, proximity-based learning is used or n-gram embedding techniques can be applied. Standard embeddings lead to the desired fixed-length vector encoding, but they are costly and have substantial limitations in preserving the full information of the original data. As an information-preserving alternative, we propose a complex-valued vector embedding of proximity data. This allows suitable machine learning algorithms to use these fixed-length, complex-valued vectors for further processing. The complex-valued data can serve as input to complex-valued machine learning algorithms. In particular, we address supervised learning and use extensions of prototype-based learning. The proposed approach is evaluated on a variety of standard benchmarks and shows strong performance compared to traditional techniques in processing non-metric or non-positive-semidefinite (non-psd) proximity data.
Comment: Proximity learning, embedding, complex values, complex-valued embedding, learning vector quantization
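The abstract does not spell out the construction, so the following is a sketch of one natural way to obtain an information-preserving complex-valued embedding, assumed here for illustration: double-center a dissimilarity matrix into a similarity matrix S, take its eigendecomposition S = U diag(λ) Uᵀ, and set X = U diag(√λ) with complex square roots. Negative eigenvalues (the non-psd part) then become purely imaginary coordinates instead of being clipped, and X Xᵀ (plain transpose, no conjugation) reproduces S exactly. Function names are hypothetical, and the complex-valued prototype learner is not shown.

```python
import numpy as np


def double_center(D):
    """Convert a symmetric dissimilarity matrix into a similarity matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return -0.5 * J @ (D ** 2) @ J


def complex_embedding(S):
    """Embed a symmetric, possibly indefinite similarity matrix as complex vectors.

    Rows of the result satisfy X @ X.T == S (transpose without conjugation).
    """
    S = 0.5 * (S + S.T)  # enforce exact symmetry
    eigvals, U = np.linalg.eigh(S)
    # The square root of a negative eigenvalue is purely imaginary, so the
    # non-Euclidean part of the proximities is kept rather than discarded.
    return U * np.sqrt(eigvals.astype(complex))


# A non-metric dissimilarity: d(0, 2) = 3 exceeds d(0, 1) + d(1, 2) = 1.
D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 0.0],
              [3.0, 0.0, 0.0]])
S = double_center(D)
X = complex_embedding(S)
print(np.allclose(X @ X.T, S))  # prints True
```

Each object becomes a fixed-length complex vector (one coordinate per eigenvalue), which is the kind of input that complex-valued learners such as extensions of learning vector quantization can consume.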